Non-Gaussian Likelihoods

Gaussian Process Summer School, Melbourne, Australia

25th-27th February 2015

Ricardo Andrade Pacheco, Max Zwiessele, James Hensman and Neil Lawrence

In this lab we are going to consider approaches to dealing with non-Gaussian likelihoods.


In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import GPy
import pods
from IPython.display import display

In [45]:
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
import cPickle as pickle
import urllib

urllib.urlretrieve('http://staffwww.dcs.sheffield.ac.uk/people/M.Zwiessele/gpss/lab2/EastTimor.pickle', 'EastTimor2.pickle')

#Load the data
with open("./EastTimor2.pickle","rb") as f:
    X,y,polygons = pickle.load(f)

Now we will create a map of East Timor and, using GPy, plot the data on top of it. A classification model can be defined in a similar way to the regression model, but now using GPy.models.GPClassification. However, once we've defined the model, we also need to update the approximation to the likelihood. This runs the expectation propagation (EP) updates.


In [46]:
print y.shape
print X.shape


(67, 1)
(67, 2)

In [47]:
# Visualize a map of East Timor
fig = plt.figure()
ax = fig.add_subplot(111)
for p in polygons:
    ax.add_collection(PatchCollection([Polygon(p)],facecolor="#F4A460"))
ax.set_xlim(124.,127.5)
ax.set_ylim(-9.6,-8.1)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")

#Define the model
kern = GPy.kern.RBF(2)
model = GPy.models.GPClassification(X,y, kernel=kern)
display(model)
model.plot(ax=ax)


Model: gp_classification
Log-likelihood: -115.594679675
Number of Parameters: 2

gp_classification. Value Constraint Prior Tied to
rbf.variance 1.0 +ve
rbf.lengthscale 1.0 +ve
Out[47]:
{'contour': <matplotlib.contour.QuadContourSet instance at 0x11104ed88>,
 'dataplot': <matplotlib.collections.PathCollection at 0x116438dd0>}

The decision boundary should be quite poor! However, we haven't optimized the model yet. Try the following:


In [48]:
model.randomize()
model.optimize('bfgs')

print 'Log likelihood={}'.format(model.log_likelihood())
display(model)

fig = plt.figure()
ax = fig.add_subplot(111)
for p in polygons:
    ax.add_collection(PatchCollection([Polygon(p)],facecolor="#F4A460"))
ax.set_xlim(124.,127.5)
ax.set_ylim(-9.6,-8.1)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
model.plot(ax=ax)


Log likelihood=-107.199692241

Model: gp_classification
Log-likelihood: -107.199692241
Number of Parameters: 2

gp_classification. Value Constraint Prior Tied to
rbf.variance 1.66833160644 +ve
rbf.lengthscale 0.316603830844 +ve
Out[48]:
{'contour': <matplotlib.contour.QuadContourSet instance at 0x1160957a0>,
 'dataplot': <matplotlib.collections.PathCollection at 0x11609d9d0>}

The optimization is based on the likelihood approximation that was made when we constructed the model. However, because we have now changed the model parameters, the quality of that approximation has probably deteriorated. To improve the model we should iterate between updating the expectation propagation approximation and optimizing the model parameters.

Exercise 1

Write a for loop to optimize the model by iterating between EP updates and kernel parameter optimization. What happens to the decision boundary after these iterations?


In [50]:
# Exercise 1 answer here
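
One possible pattern for Exercise 1 is sketched below. This is only a sketch, not the official solution: it assumes that each call to optimize re-runs the EP updates at the current parameter values, so that repeating the call alternates between refreshing the approximation and re-optimizing the kernel parameters, and it simply monitors the log likelihood across iterations before re-plotting the decision boundary.


In [ ]:
# a sketch for Exercise 1: alternate between EP updates and parameter optimization
# (assumes each optimize call refreshes the EP approximation at the current parameters)
for i in range(5):
    model.optimize('bfgs', max_iters=100)
    print 'iteration {}, log likelihood = {}'.format(i, model.log_likelihood())

# re-plot the decision boundary to see how it has changed
fig = plt.figure()
ax = fig.add_subplot(111)
for p in polygons:
    ax.add_collection(PatchCollection([Polygon(p)], facecolor="#F4A460"))
ax.set_xlim(124., 127.5)
ax.set_ylim(-9.6, -8.1)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
model.plot(ax=ax)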

Robust Regression: A Running Example

Yesterday we considered the Olympic marathon data and noted that the 1904 result was an outlier. Today we'll see if we can deal with that outlier by using a non-Gaussian likelihood. Noise sampled from a Student-$t$ density is heavier tailed than noise sampled from a Gaussian, but it cannot be trivially assimilated into the Gaussian process framework. Below we use the Laplace approximation to incorporate this noise model.


In [2]:
# redownload the marathon data from yesterday and plot
data = pods.datasets.olympic_marathon_men()
X = data['X']
Y = data['Y']

plt.plot(X, Y, 'bx')
plt.xlabel('year')
plt.ylabel('marathon pace min/km')


Out[2]:
<matplotlib.text.Text at 0x10ee6b810>

In [4]:
# make a Student-t likelihood with standard parameters
t_distribution = GPy.likelihoods.StudentT(deg_free=5, sigma2=2)
laplace = GPy.inference.latent_function_inference.Laplace()

kern = GPy.kern.RBF(1, lengthscale=50) + GPy.kern.Matern52(1, lengthscale=10) + GPy.kern.Bias(1)
model = GPy.core.GP(X, Y, kernel=kern, inference_method=laplace, likelihood=t_distribution)
model.constrain_positive('t_noise')

model.optimize()
model.plot()
display(model)


WARNING: reconstraining parameters gp

Model: gp
Log-likelihood: -0.0708404207615
Number of Parameters: 7
Updates: True

gp. Value Constraint Prior Tied to
add.rbf.variance 1.23411728029 +ve
add.rbf.lengthscale 50.2301885592 +ve
add.Mat52.variance 0.00209233702706 +ve
add.Mat52.lengthscale 11.0234048284 +ve
add.bias.variance 4.31831728884 +ve
Student_T.t_scale2 0.011674458475 +ve
Student_T.deg_free 5.0 +ve

Exercise 2

Compare this model with a regression model using a Gaussian likelihood. What difference do you notice in the predictions? Which model do you think is better?


In [53]:
# Exercise 2 answer
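
A minimal sketch of one way to start Exercise 2 is shown below. It fits an ordinary GP regression model (Gaussian likelihood) with the same kernel structure as above so that the two fits can be plotted side by side; the variable names and plotting calls are just one reasonable choice, not the prescribed solution.


In [ ]:
# a sketch for Exercise 2: fit a GP regression model with a Gaussian likelihood
# using the same kernel structure and compare the fits visually
kern_gauss = GPy.kern.RBF(1, lengthscale=50) + GPy.kern.Matern52(1, lengthscale=10) + GPy.kern.Bias(1)
model_gauss = GPy.models.GPRegression(X, Y, kernel=kern_gauss)
model_gauss.optimize()
display(model_gauss)

# plot both models: the Student-t fit should be less affected by the 1904 outlier
model.plot()
plt.title('Student-t likelihood (Laplace approximation)')
model_gauss.plot()
plt.title('Gaussian likelihood')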